
[examples] Big refactor of examples and documentation #509

Merged
17 commits merged into main from refactor-examples on Jul 14, 2023

Conversation

younesbelkada (Contributor) commented on Jul 10, 2023

This PR is an attempt to refactor most of the examples and adapt the documentation accordingly.

It also adds a small enhancement to the reward trainer so that it directly calls the prepare_model_for_int8_training method under the hood.
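As a rough sketch of what that enhancement implies (the helper name wrap_for_peft_training and the attribute check are illustrative assumptions, not the actual trainer code):

from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

def wrap_for_peft_training(model, peft_config: LoraConfig):
    # Illustrative only: if the base model was loaded in 8-bit, ready it for
    # training (freeze base weights, cast norms) before attaching the adapters.
    if getattr(model, "is_loaded_in_8bit", False):
        model = prepare_model_for_int8_training(model)
    return get_peft_model(model, peft_config)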

TODOs:

Examples:

  • Add a script for SFTTrainer
  • Modify the docs accordingly for SFTTrainer
  • Add a script for RewardTrainer
  • Modify the docs accordingly for RewardTrainer
  • Add one or two scripts for PPOTrainer
  • Modify the docs accordingly for PPOTrainer
  • Try to reproduce StackLlama using only the new scripts - let's move all research projects into a dedicated folder
  • Deal with examples/summarization folder

Removing all scripts related to merging adapter weights since in PEFT you can simply do:

model = model.merge_and_unload()
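For reference, a minimal end-to-end sketch of that merge step with PEFT (the model name and paths below are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the trained adapter (placeholder names/paths)
base_model = AutoModelForCausalLM.from_pretrained("base-model-name")
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

# Fold the adapter weights into the base model and drop the PEFT wrappers
model = model.merge_and_unload()

# The merged model can then be saved like any regular transformers model
model.save_pretrained("path/to/merged-model")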

Documentation

To discuss

cc @lvwerra @vwxyzjn

HuggingFaceDocBuilderDev commented on Jul 10, 2023

The documentation is not available anymore as the PR was closed or merged.

younesbelkada marked this pull request as ready for review on July 11, 2023 at 09:19
lvwerra (Member) left a comment

Overall, looks really great! Left a few small comments and two main points:

Doc structure

What do you think about simplifying/renaming the sections a bit in the docs:

  • API

    • Model Classes
    • Trainer Classes
    • Reward Model Training
    • Supervised Fine-Tuning
    • Best-of-N Sampling
  • Examples

    • Sentiment Tuning
    • Training with PEFT
    • Detoxifying LLMs
    • StackLlama
    • Multi-Adapter Training

I think we could also do a better job at giving an overview of the docs in the landing page on the docs.

Examples structure

For consistency I would call the reward examples folder reward_trainer. That said, I wonder if we shouldn't simplify the examples further so that there are only these folders:

  • research_projects
  • notebooks
  • scripts

Most of the folders only have 1-2 scripts anyway.

@@ -28,7 +28,7 @@

 if is_peft_available():
-    from peft import PeftModel, get_peft_model
+    from peft import PeftModel, get_peft_model, prepare_model_for_int8_training
A Member commented:

what about int4?

younesbelkada (Contributor, Author) replied:

It works for both. PEFT now has a method called prepare_model_for_kbit_training, but it has not been released yet (it is the same method, just renamed).
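For illustration, one way code could stay compatible with both names until that PEFT release lands (a hedged sketch, not necessarily what TRL does):

# Prefer the new PEFT name when the installed version has it,
# otherwise fall back to the older int8-specific name.
try:
    from peft import prepare_model_for_kbit_training
except ImportError:
    from peft import prepare_model_for_int8_training as prepare_model_for_kbit_training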

examples/reward_modeling/train_reward_model.py: an outdated review comment on this file was resolved.
younesbelkada (Contributor, Author) left a comment

Thanks @lvwerra for the extensive review, should be all good now!

lvwerra (Member) left a comment

Looks great, I think we can merge soon! A few small nits:

Short Section Names

It would be really nice if we could make all the section heads fit on a single line in the sidebar. Suggestions:

  • Training your own reward model -> Reward Model Training
  • Best of N sampling - Better model output without reinforcement learning -> Best of N Sampling
  • Multi Adapter RL (MARL) - a single base model for everything -> Multi Adapter RLHF or Multi Adapter RL?
  • Using Llama with TRL -> Training StackLlama

Section Capitalisation

Also, we should be consistent in capitalisation. I am in favour of capitalising the section heads, e.g. Reward Model Training rather than Reward model training. What do you think?

SFTTrainer

It seems to be missing from the Trainer Classes docs page.

Comment on lines +16 to +17 ("API documentation:")
A Member commented:

What do you think about adding 1 sentence here per bullet point for people who don't know e.g. what an SFTTrainer could do or what a StackLlama is :)

younesbelkada (Contributor, Author) replied:

Makes sense!

younesbelkada (Contributor, Author) commented:

Looks much cleaner now! Addressed your comment @lvwerra!

younesbelkada merged commit 5c7bfbc into main on Jul 14, 2023
younesbelkada deleted the refactor-examples branch on July 14, 2023 at 10:00